The RNA-seq data in this project originates from sequencing done at the Genomics Centre of QMUL. The samples came from Prof. Filipe Cabreiro's lab (then UCL, now Imperial) from normal and neurexin-deficient worms - the work was mostly carried out by Peter Cooke (then Phd student). Other members in Filipe's lab had helped out at the time with supervision and RNA extraction. The sequencing centre did not achieve good ribosomal depletion as it transpired that they did not have the right kit for C.elegans. Hence, only medium/high expressed genes are seen in these data and ribosomal RNA reads must be removed before analysing the data further.
Aine did a differential gene expression analysis based on counting reads using HTseq count and doing differential expresion with DESEq2 and limma. However, if I remember well, she missed out ~ half the reads due to a problem in one of her shell scripts (when pooling together the lanes, I think she pulled only two out of the four available).
It is now recognised that gene expression analysis should be based on aggregating the results of transcript-based analysis. Below is a list of links that describe how this should be done:
The data comprise 64 FASTQ files in total. There are 8 samples, each split into 4 lanes and each being sequenced in paired-end mode, hence there are: 8 x 4 x 2= 64 files.All data is from reversely-stranded libraries i.e. read 2 is in the same direction as the transcript on the genome. - 4 normal samples, names starting with "N" (biological replicates) - 4 neurexin-deficient samples, names starting with "SG" (biological replicates).
All samples originated from 1-day old adult worms.
Below is an outline of the steps to analyse the RNA-seq data from Aine's project, starting with raw fastq data and ending with mapped reads to the C. elegans transcriptome using kallisto.
All code was run on the departmental server thoth. Where the variable $PROJECT is used below it refers to the following directory: /d/in16/u/ubcg71a/research/filipe/ .
Initial fastqc data quality analysis on raw data was carried out by both Aine and repeated by Krzysztof at some point. Results can be found here: /d/in13/u/kszkop01/worm_neurexin/fastqc/
Had I run it myself I would have used my bash script run_fastqc.sh for running fastQC on multiple files:
#!/bin/bash
# Runs FASTQC on all fastq.gz files found in given directory
# Run as:
# ./run_fastqc dir_of_fastq_files
timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1 #all output will be logged to logfile
FASTQC_DIR="/s/software/fastqc/v0.11.8/FastQC/fastqc"
echo "Running FASTQC using executable: $FASTQC_DIR"
for fastq_file in $1/*.fastq.gz;
do
echo "Fastq file: $fastq_file"
$FASTQC_DIR $fastq_file -o .
done
I summarised K's FASTQC results using multiQC:
module use -s /s/mm/modules
module load python/v2
multiqc /d/in13/u/kszkop01/worm_neurexin/fastqc/
Results are under: $PROJECT/multiqc_on_raw/
All samples have sequences of a single length (76 bp).
Diagnostic plots from FASTQC summarised with multiQC (in all these plots, files with R1 are coloured red, files with R2 are blue):
Alt text
Alt text
I trimmed (adapters and poly(A) tails) and quality-filtered the reads using the program fastp :
Raw fastq data is under: $PROJECT/data/Data/170519_NS500784_0235_AHT2JYBGX2/Demulti/Data/Intensities/BaseCalls/GC-IN-6725/
To remove poly(A) tails and adapters, I ran the script under $PROJECT/fastp:
nohup ./run_fastp.sh ../data/Data/170519_NS500784_0235_AHT2JYBGX2/Demulti/Data/Intensities/BaseCalls/GC-IN-6725/ >& run_fastp.out &
Below is the run_fastp.sh script:
#!/bin/bash
# Runs fastp in PE mode for all .fastq.gz files in a directory
# Run as:
# ./run_fastp.sh directory_of_samples
timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1 #all output will be logged to logfile
FASTP_EXEC="/d/in7/s/fastp/fastp"
DIR=$1
ADAPTER_FILE="./adapters_all.fa"
echo "Running fastp using executable: $FASTP_EXEC"
for file in `ls $DIR/*_R1_*.fastq.gz`;
do
sample=${file/$DIR\//}
sample=${sample/_R1_001.fastq.gz/}
echo "Sample= $sample"
$FASTP_EXEC -i "$DIR/$sample"_R1_001.fastq.gz \
-I "$DIR/$sample"_R2_001.fastq.gz \
-o "$sample"_R1_001_trimmed.fastq.gz \
-O "$sample"_R2_001_trimmed.fastq.gz \
--adapter_fasta $ADAPTER_FILE \
-l 30 --trim_poly_x \
-h "$sample"_fastp.html -j "$sample"_fastp.json
done
NOTE: fastp by default uses the following quality filtering options:
Re-ran fastQC with the files in the fastp directory: ./run_fastp.sh ../fastp Results in the directory: $PROJECT/fastqc_after_fastp
Then re-ran multiqc and moved the .html file locally to save the plots shown below:
Alt text
Alt text
Alt text
Alt text
Alt text
Alt text
Results are under $PROJECT/fastqc_after_fastp/multiqc
Seeing that there were no strong lane effects, I merged all 4 lanes for each sample (this could have been done following mapping but there really was no evidence here to suggest a lane effect). The new files are in the $PROJECT/fastp_merged directory and the script run_fastq_merger.sh is given below:
#!/bin/bash
# Merges all fastq files for the same sample and same read orientation into single fastqc
# Run as:
# ./run_fastq_merger.sh directory_of_fastq_files sample_name sample_number
timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1 #all output will be logged to logfile
MERGED_DIR="/d/in16/u/ubcg71a/research/filipe/fastp_merged/"
cd $1
sample=$2
samplenum=$3
echo $(pwd)
echo Merging for sample $sample
cat "$sample"*"$samplenum"*R1_001*fastq.gz > $MERGED_DIR/"$sample"_"$samplenum"_R1.fastq.gz
cat "$sample"*"$samplenum"*R2_001*fastq.gz > $MERGED_DIR/"$sample"_"$samplenum"_R2.fastq.gz
./run_fastq_merger.sh ../fastp/ N1 S5
./run_fastq_merger.sh ../fastp/ N2 S6
./run_fastq_merger.sh ../fastp/ N3 S7
./run_fastq_merger.sh ../fastp/ N4 S8
./run_fastq_merger.sh ../fastp/ SG1 S1
./run_fastq_merger.sh ../fastp/ SG2 S2
./run_fastq_merger.sh ../fastp/ SG3 S3
./run_fastq_merger.sh ../fastp/ SG4 S4
Before carrying out the analysis with kallisto/sleuth, we will map the reads using STAR and carry out some exploratory analysis of the data.
Aine's work showed a huge number of reads mapping to rRNA.Here, we will allow them to map and exclude them later. During the kallisto run, we will remove them from the transcriptome before mapping to save hassle.
First, we need to create an index of the genome with STAR:
module load star/v2.7
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir . --genomeFastaFiles ../Caenorhabditis_elegans.WBcel235.dna.toplevel.fa --sjdbGTFfile ../Caenorhabditis_elegans.WBcel235.99.gtf --genomeSAindexNbases 12
#(note that parameter genomeSAindexNbases must be scaled down to 12 from the default 14 because the genome
#is smaller than the human genome)
#The genome index is currently under: /d/in16/u/ubcg71a/research/genomes/c.elegans/ensembl_99/STAR_INDEX/
#Once the genome index is created we can map the files using a bash script:
nohup ./run_star.sh ../fastp/ list_files > & run_star.out &
#where list_files contains a list of all samples
#N1_S5
#N2_S6
#N4_S8
#N3_S7
#SG1_S1
#SG2_S2
#SG3_S3
#SG4_S4
The script run_star.sh is given below:
#!/bin/bash
# Runs STAR mapping on all samples in a given directory
# Samples must end in .fastq.gz and are given as a list in a file (second argument)
# Run as:
# run_star.sh directory_of_fastq_files list_of_samples
timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1 #all output will be logged to logfile
dir=$1
shift
#set location of executables
STAR_EXEC="/s/software/STAR/bin/Linux_x86_64/STAR"
numProc=5 #number of processors/threads to be used
genomeIndex="/d/in16/u/ubcg71a/research/genomes/c.elegans/ensembl_99/STAR_INDEX/" #directory for genome index files
for sample in `cat $1`;
do
echo "Running STAR on sample $sample (paired-end reads) ..."
inputFile1="$dir$sample"_R1.fastq.gz
inputFile2="$dir$sample"_R2.fastq.gz
STAR --runThreadN $numProc \
--runMode alignReads \
--outSAMtype BAM SortedByCoordinate \
--readFilesCommand zcat \
--genomeDir $genomeIndex \
--outFileNamePrefix "$sample"_ \
--readFilesIn $inputFile1 $inputFile2 \
--outReadsUnmapped Fastx \
--quantMode GeneCounts
done
echo "All done!"
STAR produces separate files with the unmapped reads (which we can use, for example, to map to circular RNA transcripts later) and it also produces gene counts. Ideally one should rely on transcript counts for differential gene expression; here we will use gene counts for the exploratory part of the analysis only.
I summarised the STAR output by running multiqc again. Results are under $PROJECT/star_mapping/multiqc/
I start by defining my own version of DESeq2's plotPCA() function so that I can define which PCs I will be plotting
#define my own pcaplot to allow other components besides PC1 and PC2 to be plotted
#the function itself is copied from DESEq2
myPlotPCA <- function (x, intgroup = "condition", pc1 = 1, pc2 =2, ntop = 500, returnData = FALSE)
{
rv <- genefilter::rowVars(assay(x)) #if using with SummarizedExperiment objects
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
length(rv)))]
pca <- prcomp(t(assay(x)[select, ]))
percentVar <- pca$sdev^2/sum(pca$sdev^2)
if (!all(intgroup %in% names(colData(x)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}
intgroup.df <- as.data.frame(colData(x)[, intgroup, drop = FALSE])
group <- if (length(intgroup) > 1) {
factor(apply(intgroup.df, 1, paste, collapse = " : "))
}
else {
colData(x)[[intgroup]]
}
d <- data.frame(PC1 = pca$x[, pc1], PC2 = pca$x[, pc2], group = group,
intgroup.df, names = colnames(x))
if (returnData) {
attr(d, "percentVar") <- percentVar[c(pc1,pc2)]
return(d)
}
ggplot(data = d,
aes_string(x = "PC1", y = "PC2", color = "group"))+
geom_point(size=3) +
xlab(paste0("PC",pc1,": ", round(percentVar[pc1] * 100), "% variance")) +
ylab(paste0("PC",pc2,": ", round(percentVar[pc2] * 100), "% variance"))
}
gtf <- rtracklayer::import('genomes/Caenorhabditis_elegans.WBcel235.99.gtf')
gtf_df=as.data.frame(gtf)
t2g <- data.frame(target_id = gtf_df$transcript_id,
ens_gene = gtf_df$gene_id,
ext_gene = gtf_df$gene_name)
t2g <- t2g %>% drop_na()
t2g <- dplyr::distinct(t2g)
The gene counts are in the .tab files of STAR. I moved those locally and use code from: http://biostars.org/p/241602 to read in counts so they can be used with DESeq2.
library(ggplot2)
library(DESeq2)
## Loading required package: SummarizedExperiment
## Loading required package: DelayedArray
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: BiocParallel
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following objects are masked from 'package:base':
##
## aperm, apply, rowsum
#First, read in the gene counts from STAR
number<- 4 #reverse-stranded library so counts should be in column 4
ff <- list.files( path = "./star_mapping", pattern = "*ReadsPerGene.out.tab$", full.names = TRUE )
counts.files <- lapply( ff, read.table, skip = 4 )
starcounts <- as.data.frame( sapply( counts.files, function(x) x[ , number ] ) )
ff <- gsub( "[_]ReadsPerGene[.]out[.]tab", "", ff )
ff <- gsub( "[.]/star_mapping/", "", ff )
colnames(starcounts) <- ff
row.names(starcounts) <- counts.files[[1]]$V1
rm(counts.files)
coldata <- read.table("metadata.txt", header=T)
rownames(coldata) <- coldata[,1]
coldata
## sample condition
## N1_S5 N1_S5 control
## N2_S6 N2_S6 control
## N3_S7 N3_S7 control
## N4_S8 N4_S8 control
## SG1_S1 SG1_S1 neurexin
## SG2_S2 SG2_S2 neurexin
## SG3_S3 SG3_S3 neurexin
## SG4_S4 SG4_S4 neurexin
#Check row and column names match in the same order
rownames(coldata) == colnames(starcounts)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
dds = DESeqDataSetFromMatrix(countData = starcounts,
colData = coldata,
design = ~ condition )
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds
## class: DESeqDataSet
## dim: 46904 8
## metadata(1): version
## assays(1): counts
## rownames(46904): WBGene00197333 WBGene00198386 ... WBGene00010967
## WBGene00014473
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#pre-filter to remove genes with too few counts
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds
## class: DESeqDataSet
## dim: 14163 8
## metadata(1): version
## assays(1): counts
## rownames(14163): WBGene00015153 WBGene00002061 ... WBGene00010966
## WBGene00010967
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#check quickly the distribution of the other counts
quantile(rowSums(counts(dds)))
## 0% 25% 50% 75% 100%
## 10 51 364 2168 83054802
#and in more detail
quantile(rowSums(counts(dds)), probs=seq(0.9,1,0.005))
## 90% 90.5% 91% 91.5% 92% 92.5%
## 5149.00 5294.22 5505.68 5756.92 6024.16 6250.85
## 93% 93.5% 94% 94.5% 95% 95.5%
## 6603.22 6954.88 7311.76 7756.54 8130.50 8767.13
## 96% 96.5% 97% 97.5% 98% 98.5%
## 9355.20 10231.64 11092.54 12633.10 14999.88 18921.49
## 99% 99.5% 100%
## 26026.46 38536.55 83054802.00
vsd <- vst(dds, blind=TRUE)
plotPCA(vsd, intgroup=c("condition"))
This initial PCA looks promising with good separation of the samples. However, the rRNAs remain in the counts as shown above and they could be skewing the results.
We need to remove all counts corresponding to rRNA transcript_biotype reads (there are 23) - the genome/transcriptome files are from Ensembl and can be found in this directory: /d/in16/u/ubcg71a/research/genomes/c.elegans
zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep biotype:rRNA > list_of_biotype_rRNA_transcripts.txt
#keep gene names only
awk '{print $4}' list_of_biotype_rRNA_transcripts.txt | sed 's/gene://' | sed 's/\..//' > list_biotype_rRNA_genes.txt
We will now move this list of rRNA genes locally and use it to filter out the unwanted counts.
library("pheatmap")
#clean up
rm(ff, coldata, starcounts, dds, vsd)
number<- 4 #reverse-stranded library so counts should be in column 4
ff <- list.files( path = "./star_mapping", pattern = "*ReadsPerGene.out.tab$", full.names = TRUE )
counts.files <- lapply( ff, read.table, skip = 4 )
starcounts <- as.data.frame( sapply( counts.files, function(x) x[ , number ] ) )
ff <- gsub( "[_]ReadsPerGene[.]out[.]tab", "", ff )
ff <- gsub( "[.]/star_mapping/", "", ff )
colnames(starcounts) <- ff
row.names(starcounts) <- counts.files[[1]]$V1
rm(counts.files)
coldata <- read.table("metadata.txt", header=T)
rownames(coldata) <- coldata[,1]
coldata
## sample condition
## N1_S5 N1_S5 control
## N2_S6 N2_S6 control
## N3_S7 N3_S7 control
## N4_S8 N4_S8 control
## SG1_S1 SG1_S1 neurexin
## SG2_S2 SG2_S2 neurexin
## SG3_S3 SG3_S3 neurexin
## SG4_S4 SG4_S4 neurexin
#Check row and column names match in the same order
rownames(coldata) == colnames(starcounts)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#now remove the rRNAs from the counts table
rRNAgenes <- read.table(file="list_biotype_rRNA_genes.txt", header=F)
rRNAgenes
## V1
## 1 WBGene00004677
## 2 WBGene00004512
## 3 WBGene00004513
## 4 WBGene00004567
## 5 WBGene00004622
## 6 WBGene00014454
## 7 WBGene00014472
## 8 WBGene00014621
## 9 WBGene00077465
## 10 WBGene00077466
## 11 WBGene00077467
## 12 WBGene00077468
## 13 WBGene00077469
## 14 WBGene00077470
## 15 WBGene00077471
## 16 WBGene00077472
## 17 WBGene00077473
## 18 WBGene00077474
## 19 WBGene00077475
## 20 WBGene00077476
## 21 WBGene00077477
## 22 WBGene00189966
## 23 WBGene00235197
starcounts.norRNA <- starcounts[!row.names(starcounts)%in%rRNAgenes[,1],]
dim(starcounts)
## [1] 46904 8
dim(starcounts.norRNA)
## [1] 46881 8
dim(rRNAgenes)
## [1] 23 1
rm(starcounts) #clean up
dds = DESeqDataSetFromMatrix(countData = starcounts.norRNA,
colData = coldata,
design = ~ condition )
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds
## class: DESeqDataSet
## dim: 46881 8
## metadata(1): version
## assays(1): counts
## rownames(46881): WBGene00197333 WBGene00198386 ... WBGene00010967
## WBGene00014473
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#pre-filter to remove genes with too few counts
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds
## class: DESeqDataSet
## dim: 14156 8
## metadata(1): version
## assays(1): counts
## rownames(14156): WBGene00015153 WBGene00002061 ... WBGene00010966
## WBGene00010967
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#check quickly the distribution of the other counts
quantile(rowSums(counts(dds)))
## 0% 25% 50% 75% 100%
## 10.00 51.00 363.00 2162.25 240054.00
#and in more detail
quantile(rowSums(counts(dds)), probs=seq(0.9,1,0.005))
## 90% 90.5% 91% 91.5% 92% 92.5% 93%
## 5137.500 5279.275 5485.100 5740.000 6012.600 6230.750 6569.650
## 93.5% 94% 94.5% 95% 95.5% 96% 96.5%
## 6916.775 7291.800 7736.800 8108.000 8734.175 9302.200 10166.250
## 97% 97.5% 98% 98.5% 99% 99.5% 100%
## 11065.100 12575.500 14494.600 18757.400 25663.900 37461.975 240054.000
#show the genes that still have very large counts
i<- rowSums(counts(dds)) >= 100000
assay(dds)[i,]
## N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4
## WBGene00000915 13312 12611 10603 15602 12740 10132 12254 14820
## WBGene00006929 18386 19388 18132 20877 18804 15398 18484 23763
## WBGene00006926 27452 26867 26905 28783 20753 17046 19889 25582
## WBGene00006930 29295 29066 28874 31287 30451 24642 29171 37268
## WBGene00001167 11896 10835 10303 14872 18855 12461 18973 21002
## WBGene00001168 21195 20607 20916 26436 23787 17016 25938 33677
#Check correlation between raw gene expression vectors for all samples (without log transformation)
pairs(assay(dds))
#and following log transformation
pairs(log(assay(dds)+0.5)) #add 0.5 as pseudocount
cor.mat <- cor(assay(dds))
cor.mat
## N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3
## N1_S5 1.0000000 0.9876924 0.9951246 0.9802646 0.9524706 0.9692726 0.9512352
## N2_S6 0.9876924 1.0000000 0.9900583 0.9869876 0.9472874 0.9698440 0.9392737
## N3_S7 0.9951246 0.9900583 1.0000000 0.9796024 0.9443960 0.9618507 0.9429706
## N4_S8 0.9802646 0.9869876 0.9796024 1.0000000 0.9705380 0.9718217 0.9649730
## SG1_S1 0.9524706 0.9472874 0.9443960 0.9705380 1.0000000 0.9881489 0.9968161
## SG2_S2 0.9692726 0.9698440 0.9618507 0.9718217 0.9881489 1.0000000 0.9817127
## SG3_S3 0.9512352 0.9392737 0.9429706 0.9649730 0.9968161 0.9817127 1.0000000
## SG4_S4 0.9579343 0.9485628 0.9532682 0.9725760 0.9954057 0.9829288 0.9972798
## SG4_S4
## N1_S5 0.9579343
## N2_S6 0.9485628
## N3_S7 0.9532682
## N4_S8 0.9725760
## SG1_S1 0.9954057
## SG2_S2 0.9829288
## SG3_S3 0.9972798
## SG4_S4 1.0000000
pheatmap(cor.mat, legend=T)
#Transform the counts
vsd <- vst(dds, blind=TRUE)
plotPCA(vsd, intgroup=c("condition"))
#As expected the PCA is not changed much because we only removed a few genes
#try rlog
rld <- rlog(dds, blind = TRUE)
plotPCA(rld, intgroup=c("condition"))
#we can improve the plot a bit by highlighting the replicate number in different shape
pcaData <- plotPCA(vsd, intgroup=c("condition", "sample"), returnData=TRUE)
pcaData$replicate <- sub(pattern= ".*(.)_.*",
replacement="\\1",
perl=TRUE, fixed=FALSE,
x=pcaData$sample)
pcaData
## PC1 PC2 group condition sample name replicate
## N1_S5 -10.515292 3.9809806 control:N1_S5 control N1_S5 N1_S5 1
## N2_S6 -11.343047 -4.3937220 control:N2_S6 control N2_S6 N2_S6 2
## N3_S7 -10.735343 3.8215511 control:N3_S7 control N3_S7 N3_S7 3
## N4_S8 -8.883619 -3.2672766 control:N4_S8 control N4_S8 N4_S8 4
## SG1_S1 10.795760 -0.7681986 neurexin:SG1_S1 neurexin SG1_S1 SG1_S1 1
## SG2_S2 8.886207 -2.7062952 neurexin:SG2_S2 neurexin SG2_S2 SG2_S2 2
## SG3_S3 11.160175 1.7500821 neurexin:SG3_S3 neurexin SG3_S3 SG3_S3 3
## SG4_S4 10.635157 1.5828787 neurexin:SG4_S4 neurexin SG4_S4 SG4_S4 4
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=replicate)) +
geom_point(size=3) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed()
pcaData <- myPlotPCA(vsd, pc1= 2, pc2= 3, intgroup=c("condition", "sample"), returnData=TRUE)
pcaData$replicate <- sub(pattern= ".*(.)_.*",
replacement="\\1",
perl=TRUE, fixed=FALSE,
x=pcaData$sample)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=replicate)) +
geom_point(size=3) +
xlab(paste0("PC2: ",percentVar[1],"% variance")) +
ylab(paste0("PC3: ",percentVar[2],"% variance")) +
coord_fixed()
#Check correlation between transformed gene expression vectors for all samples
pairs(assay(vsd))
cor.mat <- cor(assay(vsd))
cor.mat
## N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3
## N1_S5 1.0000000 0.9897970 0.9926442 0.9895144 0.9778167 0.9812096 0.9779625
## N2_S6 0.9897970 1.0000000 0.9904951 0.9911871 0.9752235 0.9819031 0.9726985
## N3_S7 0.9926442 0.9904951 1.0000000 0.9884828 0.9758348 0.9803706 0.9760524
## N4_S8 0.9895144 0.9911871 0.9884828 1.0000000 0.9836709 0.9840315 0.9817861
## SG1_S1 0.9778167 0.9752235 0.9758348 0.9836709 1.0000000 0.9919673 0.9944142
## SG2_S2 0.9812096 0.9819031 0.9803706 0.9840315 0.9919673 1.0000000 0.9898898
## SG3_S3 0.9779625 0.9726985 0.9760524 0.9817861 0.9944142 0.9898898 1.0000000
## SG4_S4 0.9789728 0.9741121 0.9774286 0.9830008 0.9947647 0.9907127 0.9951929
## SG4_S4
## N1_S5 0.9789728
## N2_S6 0.9741121
## N3_S7 0.9774286
## N4_S8 0.9830008
## SG1_S1 0.9947647
## SG2_S2 0.9907127
## SG3_S3 0.9951929
## SG4_S4 1.0000000
pheatmap(cor.mat, legend=T)
Next, we carry out differential gene expression based on the gene counts from STAR mapping. Note that this approach is now pretty much deprecated and replaced by transcript-based analysis summarised at the gene level so we won't really analyse the results here but add them here for completion.
#BiocManager::install("apeglm")
library(apeglm) #to use in the lfcShrink function
dds <- DESeq(dds) #estimates size factors and dispersions and fits the model
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
res
## log2 fold change (MLE): condition neurexin vs control
## Wald test p-value: condition neurexin vs control
## DataFrame with 14156 rows and 6 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## WBGene00015153 1.40026331133511 1.23369435011502 1.25556206618441
## WBGene00002061 1023.39358270591 -0.0561810129305647 0.111894106778005
## WBGene00001177 236.878096958743 0.645083433822109 0.130496798969703
## WBGene00021412 24.8706101313319 0.267994653237935 0.287508181787133
## WBGene00018976 6.03204079332923 0.0830540521723233 0.559216117300581
## ... ... ... ...
## WBGene00010963 646.235482854105 0.0560344400948453 0.532154321210568
## WBGene00010964 6610.73359986534 -0.301756164128778 0.339749774394959
## WBGene00010965 698.469898642392 -0.0714973118809531 0.592928262802426
## WBGene00010966 54.8204097454485 0.488445620632377 0.207379835476911
## WBGene00010967 2519.35135960664 0.508179713782424 0.122171381252483
## stat pvalue padj
## <numeric> <numeric> <numeric>
## WBGene00015153 0.98258332530239 0.325812554464328 NA
## WBGene00002061 -0.502090901373623 0.615603580096765 0.755603864950982
## WBGene00001177 4.94328932904997 7.68153514674162e-07 1.49941766583249e-05
## WBGene00021412 0.93212878872558 0.35126997168092 0.521797355147636
## WBGene00018976 0.148518702524594 0.881933427153093 0.933540011973405
## ... ... ... ...
## WBGene00010963 0.105297350526771 0.916139865263089 0.954440625434673
## WBGene00010964 -0.888171786621956 0.374448352556327 0.544673615403436
## WBGene00010965 -0.120583410112763 0.904021009760152 0.946326013043526
## WBGene00010966 2.35531877778328 0.0185068216532182 0.0590552767545967
## WBGene00010967 4.1595642823437 3.18855275260321e-05 0.000347870583450949
#next we want to shrink the LFC to help visualisation and ranking of genes
resultsNames(dds)
## [1] "Intercept" "condition_neurexin_vs_control"
resLFC <- lfcShrink(dds, coef="condition_neurexin_vs_control", type="apeglm")
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
## Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
## sequence count data: removing the noise and preserving large differences.
## Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
resLFC
## log2 fold change (MAP): condition neurexin vs control
## Wald test p-value: condition neurexin vs control
## DataFrame with 14156 rows and 5 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## WBGene00015153 1.40026331133511 0.0865693105221461 0.334873921280774
## WBGene00002061 1023.39358270591 -0.0501595379594478 0.106187975503463
## WBGene00001177 236.878096958743 0.6101360636599 0.131419234864905
## WBGene00021412 24.8706101313319 0.16047086366021 0.231989548013496
## WBGene00018976 6.03204079332923 0.0219340702385293 0.285509651993327
## ... ... ... ...
## WBGene00010963 646.235482854105 0.0165071893007835 0.281593365688285
## WBGene00010964 6610.73359986534 -0.15473391499059 0.25612671632153
## WBGene00010965 698.469898642392 -0.0161784771837534 0.289572855203541
## WBGene00010966 54.8204097454485 0.398418895186059 0.203231809685816
## WBGene00010967 2519.35135960664 0.476542635901622 0.122242317594528
## pvalue padj
## <numeric> <numeric>
## WBGene00015153 0.325812554464328 NA
## WBGene00002061 0.615603580096765 0.755603864950982
## WBGene00001177 7.68153514674162e-07 1.49941766583249e-05
## WBGene00021412 0.35126997168092 0.521797355147636
## WBGene00018976 0.881933427153093 0.933540011973405
## ... ... ...
## WBGene00010963 0.916139865263089 0.954440625434673
## WBGene00010964 0.374448352556327 0.544673615403436
## WBGene00010965 0.904021009760152 0.946326013043526
## WBGene00010966 0.0185068216532182 0.0590552767545967
## WBGene00010967 3.18855275260321e-05 0.000347870583450949
#MA plot
plotMA(resLFC, ylim=c(-2,2))
#contrast with the same plot but for the unshrunken log fold changes
plotMA(res, ylim=c(-2,2))
#plot top few genes
resLFCOrdered <- resLFC[order(resLFC$pvalue),]
resLFCOrdered[1:10,]
## log2 fold change (MAP): condition neurexin vs control
## Wald test p-value: condition neurexin vs control
## DataFrame with 10 rows and 5 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## WBGene00008862 222.57019613529 5.57679743055826 0.289049887335795
## WBGene00006541 816.161213344128 -1.93485791500237 0.102347519609402
## WBGene00016627 533.260899008354 -2.19269946224634 0.118338343000403
## WBGene00000657 369.382836201042 2.89687520175153 0.167550806546769
## WBGene00000712 519.893685556946 2.72103817505864 0.171840363835382
## WBGene00020218 263.048181538241 -2.09074236696147 0.132190159983885
## WBGene00016453 332.089024007193 -1.9220886364233 0.124163319245512
## WBGene00000703 450.872355408933 2.65300765292485 0.174067247594477
## WBGene00018393 711.229164789001 -1.68439569805028 0.117303671006414
## WBGene00022644 337.023507140299 1.58598181801307 0.115549105174495
## pvalue padj
## <numeric> <numeric>
## WBGene00008862 3.35503734507776e-83 4.47293578845766e-79
## WBGene00006541 6.37282902848922e-81 4.24812783039092e-77
## WBGene00016627 6.63788957158305e-78 2.94987812561151e-74
## WBGene00000657 2.60644403051886e-68 8.68727795371936e-65
## WBGene00000712 7.67188409824923e-58 2.04563117595718e-54
## WBGene00020218 1.40135503067076e-57 3.11381087815043e-54
## WBGene00016453 2.85659809132675e-55 5.44059510765261e-52
## WBGene00000703 7.94087824248563e-54 1.32334735911023e-50
## WBGene00018393 6.21843170000284e-48 9.21157015827087e-45
## WBGene00022644 5.18169923397144e-44 6.90824141873072e-41
rm(resLFC)
for (i in 1:10) {
plotCounts(dds, gene=rownames(resLFCOrdered)[i], intgroup="condition")
}
#devtools::install_github("rstudio/d3heatmap") #doesn't install otherwise
#BiocManager::install("pcaExplorer")
library(pcaExplorer)
#need a mapping from Ensembl ids to gene names; will use a t2g subset; row names must map gene ids
t2g.sub <- t2g[,c("ens_gene", "ext_gene")]
t2g.sub<- dplyr::distinct(t2g.sub)
rownames(t2g.sub) <- t2g.sub[,"ens_gene"]
colnames(t2g.sub) <- c("ens_gene", "gene_name")
#pcaExplorer(dds = dds, dst = vsd, annotation= t2g.sub)
#Conclusion: Good for quick exploration but the output downloaded plot for loadings did not agree with the one shown online so I'm not sure how buggy it is....(on another trial it worked...)
#Will explore loadings outside this...
Alt text
#BiocManager::install('PCAtools')
library(PCAtools)
## Loading required package: reshape2
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
## Loading required package: lattice
## Loading required package: cowplot
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
##
## Attaching package: 'PCAtools'
## The following objects are masked from 'package:stats':
##
## biplot, screeplot
p<- pca(assay(vsd), metadata=coldata)
screeplot(p)
## Warning: Removed 2 row(s) containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_point).
biplot(p, colby="condition")
pairsplot(p, colby= "condition")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
plotloadings(p)
## -- variables retained:
## WBGene00008862, WBGene00020218, WBGene00006541, WBGene00016627, WBGene00021468, WBGene00000716, WBGene00000656, WBGene00195017, WBGene00195016, WBGene00010314, WBGene00271656, WBGene00010965, WBGene00045168, WBGene00077652, WBGene00016534, WBGene00013699, WBGene00016457, WBGene00010215, WBGene00001789, WBGene00044922, WBGene00017661, WBGene00009523, WBGene00011345, WBGene00011974, WBGene00022240, WBGene00303364, WBGene00009129
loadings(p)[which.max(abs(loadings(p)$PC1)),]
## PC1 PC2 PC3 PC4 PC5
## WBGene00008862 0.09067169 -0.008180751 0.03666657 0.003266629 -0.03149979
## PC6 PC7 PC8
## WBGene00008862 -0.000739828 0.0002908326 0.002464358
loadings(p)[which.max(abs(loadings(p)$PC2)),]
## PC1 PC2 PC3 PC4 PC5
## WBGene00010965 -0.0002111257 -0.07537175 -0.00877533 -0.006426331 -0.0006274245
## PC6 PC7 PC8
## WBGene00010965 0.004823226 -0.005803499 -0.009104016
Observations from PCA loadings plot:
Aine's work showed a huge number of reads mapping to rRNA. I've decided it's probably best to remove them BEFORE proceeding with anything else so we don't have to worry about them later. I saw that many people use bwa or bowtie2 for this but rRNAs of eukaryotes can also have introns so would need a spliced aligner. Other solutions; BBsplit from BBmap can split reads from different references but does it based on k-mers. SortMeRNA also removes reads that map to a set of sequences (used mostly with prokaryotes and metagenomes..would need to check if it works well with eukaryotes).
I decided in the end that the easiest solution might be to rebuild the index that kallisto uses and re-run kallisto so that the rRNAs are not part of the reference transcriptome, allowing the reads to remain unmapped (hopefully!).
We need to remove all sequences with rRNA transcript_biotype reads (there are 23) - the genome/transcriptome files are from Ensembl and can be found in this directory: /d/in16/u/ubcg71a/research/genomes/c.elegans
zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep biotype:rRNA > list_of_biotype_rRNA_transcripts.txt
Found a one-liner online to do this easily (basically, stitches the lines together separated by "", greps and then separates again with newlines:
zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' | grep -vFf pattern.txt - | tr "\t" "\n" > Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa
Checking:
grep ">" Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa | wc
61428
zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep ">" | wc
61451
wc list_of_biotype_rRNA_transcripts.txt
23
Then zip the final .fa file:
gzip -9 Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa
(all transcriptome files are under: /d/in16/u/ubcg71a/research/genomes/c.elegans See README for how they were obtained):
/d/in7/s/kallisto/kallisto_linux-v0.46.1/kallisto index -i Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.index Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa.gz
This blog: http://genomespot.blogspot.com/2015/08/how-accurate-is-kallisto.html suggests that the difference between trimming before kallisto or not trimming does not make a huge difference overall.
So I did not run Trimmomatic on top of fastp as I would have done if we were mapping with STAR or similar.
Running first without pseudoalignment (which takes a lot more time and space) but with bootstrapping (this is run in the directory $PROJECT/kallisto_with_ncRNA_norRNA/):
nohup ./run_kallisto.sh ../fastp_merged/ N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4 >& run_kallisto.out &
I got initially rather low mapping rates (about 20%) and was worried I had the orientation of the reads wrong. I reversed the orientation and that gave less than 1% alignment so that was clearly not the problem. In this forum: https://github.com/pachterlab/kallisto/issues/198 there is a suggestion that the culprit is ncRNAs that are not included in the transcriptome file, which would make sense since we know that 80% of the reads are to rRNA in this case.
I re-did the run with a transcriptome that had ncRNA too. Now, alignment jumpted to 98.2% so the ncRNA missing was definitely the problem here (rRNA really..) These results were under kallisto_with_ncRNA, a directory that I eventually deleted to save space. The results without ncRNA are now renamed to kallisto_cDNA NOTE: the abundance estimates for the bootstrap are not written by default to the csv file but they can be written out using:
/d/in7/s/kallisto/kallisto_linux-v0.46.1/kallisto h5dump -o output abundance.h5
(where output is a directory where the results will be saved)
I will probably keep both versions and run sleuth with both to see whether there are any differences that affect the coding genes too.
Finally, I re-did the run with a transcriptome that had ncRNA but no rRNA. I then deleted the previous run that had ncRNA, including rRNAs to save space. So the kallisto_with_ncRNA directory has now been replaced with: kallisto_with_ncRNA_norRNA Alignment rates: p_unique: 9-10%, p_pseudoaligned: ~20%
I also did one re-run with ncRNA and no bootstrap but asking for pseudo-alignment to the genome so that we could get pseudo-bam files for visualisation: In directory: /d/in16/u/ubcg71a/research/filipe/kallisto_with_ncRNA_norRNA/pseudo_bam/
nohup ./run_kallisto_pseudobam.sh ../../fastp_merged/ N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4 >& run_kallisto.out &
This approach is following the code from: https://github.com/pachterlab/aggregationDE/blob/master/R/gene_pipeline and the procedure of aggregating p-values after transcript differential expression has been carried out to call differentially expressed genes (see paper by Yi et al: doi: 10.1186/s13059-018-1419-z).
'%!in%' <- function(x,y)!('%in%'(x,y))
setwd("~/Dropbox/work/research/Filipe_neurexin/NEW_PROCESS/")
#sample_id <- dir(file.path(".", "kallisto_cdna"))
sample_id <- dir(file.path(".", "kallisto_with_ncRNA_norRNA"))
sample_id
## [1] "N1_S5" "N2_S6" "N3_S7" "N4_S8" "SG1_S1" "SG2_S2" "SG3_S3" "SG4_S4"
#kal_dirs <- file.path(".", "kallisto_cdna", sample_id)
kal_dirs <- file.path(".", "kallisto_with_ncRNA_norRNA", sample_id)
Sleuth requires metadata information corresponding to the samples in the dataset. In this case, the metadata is very simple and was already created outside the R script. We simply need to add to this table the directories where kallisto results are saved.
s2c <- read.table("./metadata.txt", header = TRUE, stringsAsFactors=FALSE)
s2c
## sample condition
## 1 N1_S5 control
## 2 N2_S6 control
## 3 N3_S7 control
## 4 N4_S8 control
## 5 SG1_S1 neurexin
## 6 SG2_S2 neurexin
## 7 SG3_S3 neurexin
## 8 SG4_S4 neurexin
s2c <- dplyr::mutate(s2c, path = kal_dirs)
s2c
## sample condition path
## 1 N1_S5 control ./kallisto_with_ncRNA_norRNA/N1_S5
## 2 N2_S6 control ./kallisto_with_ncRNA_norRNA/N2_S6
## 3 N3_S7 control ./kallisto_with_ncRNA_norRNA/N3_S7
## 4 N4_S8 control ./kallisto_with_ncRNA_norRNA/N4_S8
## 5 SG1_S1 neurexin ./kallisto_with_ncRNA_norRNA/SG1_S1
## 6 SG2_S2 neurexin ./kallisto_with_ncRNA_norRNA/SG2_S2
## 7 SG3_S3 neurexin ./kallisto_with_ncRNA_norRNA/SG3_S3
## 8 SG4_S4 neurexin ./kallisto_with_ncRNA_norRNA/SG4_S4
We will get gene names from biomaRt to add to the transcript IDs we have from sleuth
ens <- useMart("ensembl")
attr = c("ensembl_gene_id",
"ensembl_transcript_id",
"description",
"external_gene_name",
"entrezgene_id")
celeg <- useMart("ensembl",
dataset = "celegans_gene_ensembl")
t2g <- getBM(attributes = attr,
mart = celeg)
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `filter_()` is deprecated as of dplyr 0.7.0.
## Please use `filter()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Cache found
t2g <- dplyr::rename(t2g,
target_id = ensembl_transcript_id,
ens_gene = ensembl_gene_id,
ext_gene = external_gene_name)
#### GENE-LEVEL-ANALYSIS ####
#Following code from: https://github.com/pachterlab/aggregationDE/blob/master/R/gene_pipeline.R
design <- ~ condition
so <- sleuth_prep(s2c,
full_model = design,
read_bootstrap_tpm = TRUE,
extra_bootstrap_summary=TRUE,
target_mapping = t2g,
transformation_function = function(x) log2(x + 0.5), #change the natural log fold change to log2 fold change
aggregation_column = 'ens_gene'
#filter_target_id=txfilter
)
## Warning in check_num_cores(num_cores): It appears that you are running Sleuth from within Rstudio.
## Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
## If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
## reading in kallisto results
## dropping unused factor levels
## ........
## normalizing est_counts
## 15845 targets passed the filter
## normalizing tpm
## merging in metadata
## summarizing bootstraps
## ........
so <- sleuth_fit(so, ~condition, 'full')
## fitting measurement error models
## shrinkage estimation
## 4 NA values were found during variance shrinkage estimation due to mean observation values outside of the range used for the LOESS fit.
## The LOESS fit will be repeated using exact computation of the fitted surface to extrapolate the missing values.
## These are the target ids with NA values: T23E7.4a.2, C04F6.1.1, C42D8.2a.1, K07H8.6a.1
## computing variance of betas
#do a Wald test first
so <- sleuth_wt(so, which_beta='conditionneurexin', which_model='full')
sleuth_table_wt <- sleuth_results(so,
test= 'conditionneurexin',
test_type = 'wt',
which_model = 'full')
sleuth_table_wt[1:5,]
## target_id description
## 1 WBGene00016627
## 2 WBGene00008862
## 3 WBGene00020218
## 4 WBGene00016453 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5 WBGene00000657 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1 C44B7.5 174124 1 6.017851
## 2 F15D4.5 174969 1 4.148146
## 3 T04G9.7 180424 1 5.336970
## 4 vet-2 172993 1 5.606577
## 5 col-81 174686 1 5.464653
## pval qval
## 1 7.057950e-85 8.001598e-81
## 2 1.613360e-81 9.145333e-78
## 3 1.576815e-60 5.958785e-57
## 4 1.408001e-49 3.990628e-46
## 5 3.739522e-46 8.478992e-43
sleuth_significant_wt <- dplyr::filter(sleuth_table_wt, qval <= 0.05)
dim(sleuth_significant_wt)
## [1] 2339 8
head(sleuth_significant_wt, 20)
## target_id
## 1 WBGene00016627
## 2 WBGene00008862
## 3 WBGene00020218
## 4 WBGene00016453
## 5 WBGene00000657
## 6 WBGene00022644
## 7 WBGene00012557
## 8 WBGene00003977
## 9 WBGene00010158
## 10 WBGene00010808
## 11 WBGene00007196
## 12 WBGene00018605
## 13 WBGene00017648
## 14 WBGene00000301
## 15 WBGene00018138
## 16 WBGene00009897
## 17 WBGene00018393
## 18 WBGene00021005
## 19 WBGene00011432
## 20 WBGene00000703
## description
## 1
## 2
## 3
## 4 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7
## 8 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 10
## 11
## 12
## 13 D-aspartate oxidase 3 [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 14 Caveolin-1 [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 15 Folate-like transporter 2 [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 16 Peroxidase skpo-1 [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 17 Methionine Sulfoxide Reductase A [Source:UniProtKB/TrEMBL;Acc:O02089]
## 18 Uterine Lumin Expressed/locailized [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 19 SKN-1 Dependent Zygotic transcript [Source:UniProtKB/TrEMBL;Acc:O02297]
## 20 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q21556]
## ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1 C44B7.5 174124 1 6.017851
## 2 F15D4.5 174969 1 4.148146
## 3 T04G9.7 180424 1 5.336970
## 4 vet-2 172993 1 5.606577
## 5 col-81 174686 1 5.464653
## 6 dod-19 178564 1 5.680775
## 7 Y37D8A.19 176711 1 6.422207
## 8 pes-2.1 173025 1 4.532240
## 9 pes-2.2 173026 1 4.532240
## 10 sepa-1 173196 1 4.525542
## 11 B0513.4 178351 1 5.248247
## 12 F48E3.4 181006 1 6.154206
## 13 ddo-3 184746 1 5.835642
## 14 cav-1 177815 1 5.226113
## 15 folt-2 178745 1 7.013395
## 16 skpo-1 174340 1 6.620497
## 17 msra-1 185709 2 11.335050
## 18 ule-1 171778 1 6.022399
## 19 sdz-30 173198 1 4.018878
## 20 col-129 178155 1 5.713564
## pval qval
## 1 7.057950e-85 8.001598e-81
## 2 1.613360e-81 9.145333e-78
## 3 1.576815e-60 5.958785e-57
## 4 1.408001e-49 3.990628e-46
## 5 3.739522e-46 8.478992e-43
## 6 6.131492e-41 1.158545e-37
## 7 3.821684e-39 6.189491e-36
## 8 4.000704e-38 5.039554e-35
## 9 4.000704e-38 5.039554e-35
## 10 8.901306e-38 1.009141e-34
## 11 1.780045e-37 1.834579e-34
## 12 1.672903e-36 1.580475e-33
## 13 6.915372e-36 6.030737e-33
## 14 1.116744e-33 9.043235e-31
## 15 3.081094e-32 2.328690e-29
## 16 5.041138e-31 3.571961e-28
## 17 3.562778e-30 2.375954e-27
## 18 2.273631e-29 1.432008e-26
## 19 2.532942e-29 1.511367e-26
## 20 5.864490e-29 3.324286e-26
#and then a LRT test
so <- sleuth_fit(so, ~1, 'null')
## fitting measurement error models
## shrinkage estimation
## 4 NA values were found during variance shrinkage estimation due to mean observation values outside of the range used for the LOESS fit.
## The LOESS fit will be repeated using exact computation of the fitted surface to extrapolate the missing values.
## These are the target ids with NA values: T23E7.4a.2, C04F6.1.1, C42D8.2a.1, K07H8.6a.1
## computing variance of betas
so <- sleuth_lrt(so, 'null', 'full')
sleuth_table_lrt <- sleuth_results(so, test='null:full', 'lrt')
sleuth_table_lrt[1:5,]
## target_id
## 1 WBGene00008862
## 2 WBGene00018393
## 3 WBGene00016627
## 4 WBGene00020218
## 5 WBGene00006436
## description
## 1
## 2 Methionine Sulfoxide Reductase A [Source:UniProtKB/TrEMBL;Acc:O02089]
## 3
## 4
## 5 Titin homolog [Source:UniProtKB/Swiss-Prot;Acc:G4SLH0]
## ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1 F15D4.5 174969 1 4.148146
## 2 msra-1 185709 2 11.335050
## 3 C44B7.5 174124 1 6.017851
## 4 T04G9.7 180424 1 5.336970
## 5 ttn-1 266969 15 51.353311
## pval qval
## 1 3.530000e-09 2.000981e-05
## 2 2.804708e-09 2.000981e-05
## 3 5.541369e-09 2.094083e-05
## 4 8.194098e-09 2.322412e-05
## 5 1.147580e-08 2.602022e-05
sleuth_significant_lrt <- dplyr::filter(sleuth_table_lrt, qval <= 0.05)
dim(sleuth_significant_lrt)
## [1] 2251 8
head(sleuth_significant_lrt, 20)
## target_id
## 1 WBGene00008862
## 2 WBGene00018393
## 3 WBGene00016627
## 4 WBGene00020218
## 5 WBGene00006436
## 6 WBGene00016453
## 7 WBGene00013567
## 8 WBGene00022644
## 9 WBGene00009007
## 10 WBGene00009897
## 11 WBGene00018602
## 12 WBGene00000657
## 13 WBGene00007196
## 14 WBGene00017648
## 15 WBGene00010808
## 16 WBGene00000301
## 17 WBGene00003977
## 18 WBGene00010158
## 19 WBGene00010204
## 20 WBGene00010822
## description
## 1
## 2 Methionine Sulfoxide Reductase A [Source:UniProtKB/TrEMBL;Acc:O02089]
## 3
## 4
## 5 Titin homolog [Source:UniProtKB/Swiss-Prot;Acc:G4SLH0]
## 6 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 7
## 8 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 9 OTUBain deubiquitylating protease homolog [Source:UniProtKB/TrEMBL;Acc:Q19681]
## 10 Peroxidase skpo-1 [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 11
## 12 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 13
## 14 D-aspartate oxidase 3 [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 15
## 16 Caveolin-1 [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 17 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 18 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 19
## 20
## ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1 F15D4.5 174969 1 4.148146
## 2 msra-1 185709 2 11.335050
## 3 C44B7.5 174124 1 6.017851
## 4 T04G9.7 180424 1 5.336970
## 5 ttn-1 266969 15 51.353311
## 6 vet-2 172993 1 5.606577
## 7 Y75B12B.1 190716 3 16.170090
## 8 dod-19 178564 1 5.680775
## 9 otub-3 177679 3 16.524906
## 10 skpo-1 174340 1 6.620497
## 11 F48D6.4 180697 3 10.502658
## 12 col-81 174686 1 5.464653
## 13 B0513.4 178351 1 5.248247
## 14 ddo-3 184746 1 5.835642
## 15 sepa-1 173196 1 4.525542
## 16 cav-1 177815 1 5.226113
## 17 pes-2.1 173025 1 4.532240
## 18 pes-2.2 173026 1 4.532240
## 19 F57F5.1 179645 1 7.583655
## 20 M01G12.9 187385 1 3.599120
## pval qval
## 1 3.530000e-09 2.000981e-05
## 2 2.804708e-09 2.000981e-05
## 3 5.541369e-09 2.094083e-05
## 4 8.194098e-09 2.322412e-05
## 5 1.147580e-08 2.602022e-05
## 6 1.688221e-08 3.189893e-05
## 7 2.360899e-08 3.681739e-05
## 8 2.598034e-08 3.681739e-05
## 9 3.417326e-08 4.304692e-05
## 10 4.835407e-08 5.481901e-05
## 11 8.112164e-08 8.360691e-05
## 12 1.101088e-07 9.775006e-05
## 13 1.143796e-07 9.775006e-05
## 14 1.207110e-07 9.775006e-05
## 15 1.401189e-07 1.059018e-04
## 16 2.276656e-07 1.206595e-04
## 17 2.447886e-07 1.206595e-04
## 18 2.447886e-07 1.206595e-04
## 19 2.350286e-07 1.206595e-04
## 20 2.052485e-07 1.206595e-04
#Note: The Wald test usually returns more hits than the likelihood ratio test
#check counts for top significant genes in WT
selected_gene <- sleuth_significant_wt$target_id[1]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")
selected_gene <- sleuth_significant_wt$target_id[2]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")
selected_gene <- sleuth_significant_wt$target_id[3]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")
#save the gene differential expression results so they can be used with web servers for further analysis
#check the table has no duplicate entries
length(unique(sleuth_table_wt$target_id)) == length(sleuth_table_wt$target_id)
## [1] TRUE
write.table(sleuth_significant_wt$target_id, file="results/sleuth_significant_q005_NEW.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")
#avoid using this - pointless
write.table(dplyr::filter(sleuth_table_wt, qval > 0.05)$target_id, file="results/bg_genes_good.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")
#use this instead
write.table(dplyr::filter(sleuth_table_wt, !is.na(qval))$target_id, file="results/bg_genes_all.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")
##### TRANSCRIPT-LEVEL ANALYSIS #####
#To do the transcript-level analysis, simply set the pval_aggregate to FALSE
#NOTE that target_id will now be the transcript id rather than the gene id
#We will use the Wald test from here on
sleuth_table_txn <- sleuth_results(so,
test = 'conditionneurexin',
test_type = 'wt',
which_model = 'full',
show_all = TRUE,
pval_aggregate = FALSE)
sleuth_table_txn[1:5,]
## ens_gene target_id
## 1 WBGene00016627 C44B7.5.1
## 2 WBGene00008862 F15D4.5.1
## 3 WBGene00020218 T04G9.7.1
## 4 WBGene00016453 C35E7.1a.1
## 5 WBGene00000657 F38A3.1.1
## description ext_gene
## 1 C44B7.5
## 2 F15D4.5
## 3 T04G9.7
## 4 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7] vet-2
## 5 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135] col-81
## entrezgene_id pval qval b se_b mean_obs
## 1 174124 7.057950e-85 1.118332e-80 -1.517199 0.07771507 6.017851
## 2 174969 1.613360e-81 1.278185e-77 3.874707 0.20261631 4.148146
## 3 180424 1.576815e-60 8.328212e-57 -1.449539 0.08832342 5.336970
## 4 172993 1.408001e-49 5.577445e-46 -1.328045 0.08971666 5.606577
## 5 174686 3.739522e-46 1.185054e-42 2.030342 0.14235375 5.464653
## var_obs tech_var sigma_sq smooth_sigma_sq final_sigma_sq
## 1 0.6664751 0.003078232 0.0071785109 0.009001033 0.009001033
## 2 4.3348560 0.059455577 -0.0065745514 0.022651161 0.022651161
## 3 0.6071586 0.005627500 0.0023363218 0.009974555 0.009974555
## 4 0.5104205 0.006637396 0.0009517097 0.009460763 0.009460763
## 5 1.2125366 0.006912328 0.0336168532 0.009669959 0.033616853
sleuth_significant_txn <- dplyr::filter(sleuth_table_txn, qval <= 0.05)
dim(sleuth_significant_txn)
## [1] 2263 15
head(sleuth_significant_txn, 20)
## ens_gene target_id
## 1 WBGene00016627 C44B7.5.1
## 2 WBGene00008862 F15D4.5.1
## 3 WBGene00020218 T04G9.7.1
## 4 WBGene00016453 C35E7.1a.1
## 5 WBGene00000657 F38A3.1.1
## 6 WBGene00022644 ZK6.10a.1
## 7 WBGene00012557 Y37D8A.19.1
## 8 WBGene00003977 F56G4.2.1
## 9 WBGene00010158 F56G4.3.1
## 10 WBGene00010808 M01E5.6.1
## 11 WBGene00007196 B0513.4a.1
## 12 WBGene00018605 F48E3.4.1
## 13 WBGene00017648 F20H11.5.1
## 14 WBGene00000301 T13F2.8a.1
## 15 WBGene00018138 F37B4.7.1
## 16 WBGene00009897 F49E12.1.1
## 17 WBGene00021005 W03F11.1a.1
## 18 WBGene00011432 T04D3.2.1
## 19 WBGene00000703 M18.1a.1
## 20 WBGene00010204 F57F5.1.1
## description
## 1
## 2
## 3
## 4 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7
## 8 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 10
## 11
## 12
## 13 D-aspartate oxidase 3 [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 14 Caveolin-1 [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 15 Folate-like transporter 2 [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 16 Peroxidase skpo-1 [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 17 Uterine Lumin Expressed/locailized [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 18 SKN-1 Dependent Zygotic transcript [Source:UniProtKB/TrEMBL;Acc:O02297]
## 19 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q21556]
## 20
## ext_gene entrezgene_id pval qval b se_b
## 1 C44B7.5 174124 7.057950e-85 1.118332e-80 -1.5171989 0.07771507
## 2 F15D4.5 174969 1.613360e-81 1.278185e-77 3.8747068 0.20261631
## 3 T04G9.7 180424 1.576815e-60 8.328212e-57 -1.4495391 0.08832342
## 4 vet-2 172993 1.408001e-49 5.577445e-46 -1.3280453 0.08971666
## 5 col-81 174686 3.739522e-46 1.185054e-42 2.0303425 0.14235375
## 6 dod-19 178564 6.131492e-41 1.619225e-37 1.1263589 0.08406315
## 7 Y37D8A.19 176711 3.821684e-39 8.650655e-36 -0.9832194 0.07511980
## 8 pes-2.1 173025 4.000704e-38 7.043462e-35 -1.7272661 0.13380245
## 9 pes-2.2 173026 4.000704e-38 7.043462e-35 -1.7272661 0.13380245
## 10 sepa-1 173196 8.901306e-38 1.410412e-34 -1.5304891 0.11912883
## 11 B0513.4 178351 1.780045e-37 2.564074e-34 -1.1603081 0.09069434
## 12 F48E3.4 181006 1.672903e-36 2.208929e-33 -0.9861263 0.07815018
## 13 ddo-3 184746 6.915372e-36 8.428775e-33 -0.9699391 0.07755740
## 14 cav-1 177815 1.116744e-33 1.263915e-30 -1.1029848 0.09119031
## 15 folt-2 178745 3.081094e-32 3.254662e-29 -0.8779173 0.07427461
## 16 skpo-1 174340 5.041138e-31 4.992302e-28 -0.8518164 0.07354191
## 17 ule-1 171778 2.273631e-29 2.119158e-26 -0.8533348 0.07584102
## 18 sdz-30 173198 2.532942e-29 2.229693e-26 -1.8614224 0.16557609
## 19 col-129 178155 5.864490e-29 4.890676e-26 1.8856724 0.16884974
## 20 F57F5.1 179645 2.536609e-28 2.009629e-25 -0.8851310 0.08019750
## mean_obs var_obs tech_var sigma_sq smooth_sigma_sq final_sigma_sq
## 1 6.017851 0.6664751 0.0030782318 7.178511e-03 0.009001033 0.009001033
## 2 4.148146 4.3348560 0.0594555773 -6.574551e-03 0.022651161 0.022651161
## 3 5.336970 0.6071586 0.0056274997 2.336322e-03 0.009974555 0.009974555
## 4 5.606577 0.5104205 0.0066373958 9.517097e-04 0.009460763 0.009460763
## 5 5.464653 1.2125366 0.0069123283 3.361685e-02 0.009669959 0.033616853
## 6 5.680775 0.3666408 0.0047652695 8.746247e-05 0.009367957 0.009367957
## 7 6.422207 0.2858795 0.0016874835 9.598486e-03 0.009127380 0.009598486
## 8 4.532240 0.8831048 0.0075861955 2.822000e-02 0.014554710 0.028219996
## 9 4.532240 0.8831048 0.0075861955 2.822000e-02 0.014554710 0.028219996
## 10 4.525542 0.6886811 0.0137410161 8.921226e-03 0.014642339 0.014642339
## 11 5.248247 0.3947901 0.0061483985 5.668441e-03 0.010302527 0.010302527
## 12 6.154206 0.2883113 0.0021870626 1.002784e-02 0.008995154 0.010027837
## 13 5.835642 0.2757498 0.0028869321 5.227220e-03 0.009143370 0.009143370
## 14 5.226113 0.3589810 0.0062272018 7.058848e-03 0.010404143 0.010404143
## 15 7.013395 0.2272664 0.0009177897 7.313406e-03 0.010115646 0.010115646
## 16 6.620497 0.2086972 0.0014829410 1.334146e-04 0.009333883 0.009333883
## 17 6.022399 0.2177003 0.0025040739 8.752858e-03 0.008999647 0.008999647
## 18 4.018878 1.0323351 0.0278325853 2.159394e-02 0.026998299 0.026998299
## 19 5.713564 1.0648062 0.0052853793 5.173509e-02 0.009324889 0.051735089
## 20 7.583655 0.2299978 0.0005303502 6.648096e-03 0.012332928 0.012332928
write.table(sleuth_significant_txn$target_id, file="results/sleuth_significant_TXN_q005.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")
#the transcript table contains beta values too so one can talk about up and down regulated transcripts and genes
sleuth_significant_txn.up <- dplyr::filter(sleuth_table_txn, (qval <= 0.05) & (b>0))
head(sleuth_significant_txn.up, 20)
## ens_gene target_id
## 1 WBGene00008862 F15D4.5.1
## 2 WBGene00000657 F38A3.1.1
## 3 WBGene00022644 ZK6.10a.1
## 4 WBGene00000703 M18.1a.1
## 5 WBGene00017186 F07B7.2.1
## 6 WBGene00021107 W09B7.2.1
## 7 WBGene00000712 F41F3.4.1
## 8 WBGene00001399 F10D2.9.1
## 9 WBGene00017071 D2096.3.1
## 10 WBGene00000998 K07E12.1a.1
## 11 WBGene00019617 K10C2.1.1
## 12 WBGene00010605 K06G5.1a.3
## 13 WBGene00001758 Y45G12C.2a.1
## 14 WBGene00012445 Y16B4A.2.1
## 15 WBGene00010822 M01G12.9.1
## 16 WBGene00020128 R193.2.1
## 17 WBGene00012018 T25C12.3.1
## 18 WBGene00022497 Y119D3B.21.1
## 19 WBGene00003515 K12F2.1.1
## 20 WBGene00021050 W05H9.3
## description
## 1
## 2 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 3 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 4 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q21556]
## 5
## 6
## 7 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20282]
## 8 Delta(9)-fatty-acid desaturase fat-7 [Source:UniProtKB/Swiss-Prot;Acc:G5EGH6]
## 9 Acid Alpha Glucosidase Relate [Source:UniProtKB/TrEMBL;Acc:Q19004]
## 10 Mesocentin [Source:UniProtKB/Swiss-Prot;Acc:Q09165]
## 11 Carboxypeptidase [Source:UniProtKB/TrEMBL;Acc:Q94269]
## 12
## 13 Glutathione S-transferase P 10 [Source:UniProtKB/Swiss-Prot;Acc:Q9N4X8]
## 14 Carboxypeptidase [Source:UniProtKB/TrEMBL;Acc:K8ESM2]
## 15
## 16
## 17
## 18
## 19 Myosin-3 [Source:UniProtKB/Swiss-Prot;Acc:P12844]
## 20
## ext_gene entrezgene_id pval qval b se_b
## 1 F15D4.5 174969 1.613360e-81 1.278185e-77 3.8747068 0.20261631
## 2 col-81 174686 3.739522e-46 1.185054e-42 2.0303425 0.14235375
## 3 dod-19 178564 6.131492e-41 1.619225e-37 1.1263589 0.08406315
## 4 col-129 178155 5.864490e-29 4.890676e-26 1.8856724 0.16884974
## 5 F07B7.2 179263 3.689346e-28 2.657167e-25 3.6706694 0.33360111
## 6 W09B7.2 189307 3.689346e-28 2.657167e-25 3.6706694 0.33360111
## 7 col-139 178857 5.933925e-28 4.087959e-25 1.9426318 0.17724348
## 8 fat-7 179100 7.520251e-28 4.964932e-25 0.8585808 0.07848960
## 9 aagr-1 177632 2.947247e-26 1.796120e-23 0.8908323 0.08403264
## 10 dig-1 175951 5.555927e-24 2.839795e-21 1.0040401 0.09941544
## 11 K10C2.1 180915 6.912340e-24 3.422688e-21 0.7501839 0.07443783
## 12 K06G5.1 181531 2.726013e-23 1.308899e-20 0.8157480 0.08204864
## 13 gst-10 178725 1.757955e-22 8.192588e-20 0.9915736 0.10164890
## 14 Y16B4A.2 181572 1.881803e-22 8.519191e-20 0.7677345 0.07875834
## 15 M01G12.9 187385 5.564544e-22 2.449172e-19 2.1503128 0.22312457
## 16 R193.2 182256 1.774988e-21 7.601264e-19 0.8780899 0.09226115
## 17 T25C12.3 259727 2.854599e-21 1.190293e-18 0.7502703 0.07924338
## 18 Y119D3B.21 246006 2.328616e-20 8.999250e-18 0.7831914 0.08470518
## 19 myo-3 179676 9.198664e-20 3.389601e-17 1.0465034 0.11502529
## 20 W05H9.3 <NA> 1.125043e-19 3.961402e-17 0.9587441 0.10563358
## mean_obs var_obs tech_var sigma_sq smooth_sigma_sq final_sigma_sq
## 1 4.148146 4.3348560 0.0594555773 -6.574551e-03 0.022651161 0.022651161
## 2 5.464653 1.2125366 0.0069123283 3.361685e-02 0.009669959 0.033616853
## 3 5.680775 0.3666408 0.0047652695 8.746247e-05 0.009367957 0.009367957
## 4 5.713564 1.0648062 0.0052853793 5.173509e-02 0.009324889 0.051735089
## 5 3.152944 4.0404434 0.0876218665 1.349575e-01 0.101478562 0.134957538
## 6 3.152944 4.0404434 0.0876218665 1.349575e-01 0.101478562 0.134957538
## 7 5.842900 1.1320885 0.0042960948 5.853441e-02 0.009134034 0.058534409
## 8 6.251220 0.2211785 0.0024208098 9.900426e-03 0.009027147 0.009900426
## 9 5.704640 0.2334873 0.0047861013 3.088382e-03 0.009336868 0.009336868
## 10 6.277316 0.3049706 0.0101260524 9.640806e-03 0.009039492 0.009640806
## 11 6.284922 0.1665729 0.0020386466 4.704482e-03 0.009043333 0.009043333
## 12 6.586667 0.1928418 0.0041737452 -1.006586e-03 0.009290213 0.009290213
## 13 5.038836 0.2986324 0.0081053606 1.255964e-02 0.011227695 0.012559638
## 14 5.856078 0.1750457 0.0032878082 4.460140e-03 0.009117944 0.009117944
## 15 3.599120 1.3319882 0.0494310278 -3.672657e-02 0.050138118 0.050138118
## 16 5.217482 0.2237042 0.0065780359 -2.603720e-03 0.010446203 0.010446203
## 17 7.117130 0.1715950 0.0008316071 1.172742e-02 0.010414013 0.011727418
## 18 6.711261 0.1875539 0.0013701820 1.297975e-02 0.009468767 0.012979752
## 19 5.960508 0.3355869 0.0033371772 2.312446e-02 0.009026752 0.023124459
## 20 5.586875 0.2817546 0.0042322920 1.808462e-02 0.009485827 0.018084615
#we will save the gene ids instead of transcript ones
write.table(unique(sleuth_significant_txn.up$ens_gene), file="results/sleuth_significant_q005_up.txt", quote=FALSE, row.names = FALSE, col.names = "#ens_gene")
sleuth_significant_txn.down <- dplyr::filter(sleuth_table_txn, (qval <= 0.05) & (b<0))
head(sleuth_significant_txn.down, 20)
## ens_gene target_id
## 1 WBGene00016627 C44B7.5.1
## 2 WBGene00020218 T04G9.7.1
## 3 WBGene00016453 C35E7.1a.1
## 4 WBGene00012557 Y37D8A.19.1
## 5 WBGene00003977 F56G4.2.1
## 6 WBGene00010158 F56G4.3.1
## 7 WBGene00010808 M01E5.6.1
## 8 WBGene00007196 B0513.4a.1
## 9 WBGene00018605 F48E3.4.1
## 10 WBGene00017648 F20H11.5.1
## 11 WBGene00000301 T13F2.8a.1
## 12 WBGene00018138 F37B4.7.1
## 13 WBGene00009897 F49E12.1.1
## 14 WBGene00021005 W03F11.1a.1
## 15 WBGene00011432 T04D3.2.1
## 16 WBGene00010204 F57F5.1.1
## 17 WBGene00018393 F43E2.5.2
## 18 WBGene00017490 F15E11.1.1
## 19 WBGene00017500 F15E11.14.1
## 20 WBGene00000785 W07B8.5.1
## description
## 1
## 2
## 3 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 4
## 5 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 6 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 7
## 8
## 9
## 10 D-aspartate oxidase 3 [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 11 Caveolin-1 [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 12 Folate-like transporter 2 [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 13 Peroxidase skpo-1 [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 14 Uterine Lumin Expressed/locailized [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 15 SKN-1 Dependent Zygotic transcript [Source:UniProtKB/TrEMBL;Acc:O02297]
## 16
## 17 Methionine Sulfoxide Reductase A [Source:UniProtKB/TrEMBL;Acc:O02089]
## 18 Protein Up-regulated in Daf-2(Gf) [Source:UniProtKB/TrEMBL;Acc:G5EBF3]
## 19 Protein Up-regulated in Daf-2(Gf) [Source:UniProtKB/TrEMBL;Acc:G5EBF3]
## 20 Cathepsin B-like cysteine proteinase 5 [Source:UniProtKB/Swiss-Prot;Acc:P43509]
## ext_gene entrezgene_id pval qval b se_b
## 1 C44B7.5 174124 7.057950e-85 1.118332e-80 -1.5171989 0.07771507
## 2 T04G9.7 180424 1.576815e-60 8.328212e-57 -1.4495391 0.08832342
## 3 vet-2 172993 1.408001e-49 5.577445e-46 -1.3280453 0.08971666
## 4 Y37D8A.19 176711 3.821684e-39 8.650655e-36 -0.9832194 0.07511980
## 5 pes-2.1 173025 4.000704e-38 7.043462e-35 -1.7272661 0.13380245
## 6 pes-2.2 173026 4.000704e-38 7.043462e-35 -1.7272661 0.13380245
## 7 sepa-1 173196 8.901306e-38 1.410412e-34 -1.5304891 0.11912883
## 8 B0513.4 178351 1.780045e-37 2.564074e-34 -1.1603081 0.09069434
## 9 F48E3.4 181006 1.672903e-36 2.208929e-33 -0.9861263 0.07815018
## 10 ddo-3 184746 6.915372e-36 8.428775e-33 -0.9699391 0.07755740
## 11 cav-1 177815 1.116744e-33 1.263915e-30 -1.1029848 0.09119031
## 12 folt-2 178745 3.081094e-32 3.254662e-29 -0.8779173 0.07427461
## 13 skpo-1 174340 5.041138e-31 4.992302e-28 -0.8518164 0.07354191
## 14 ule-1 171778 2.273631e-29 2.119158e-26 -0.8533348 0.07584102
## 15 sdz-30 173198 2.532942e-29 2.229693e-26 -1.8614224 0.16557609
## 16 F57F5.1 179645 2.536609e-28 2.009629e-25 -0.8851310 0.08019750
## 17 msra-1 185709 4.221976e-27 2.675888e-24 -1.3144354 0.12191886
## 18 pud-2.1 178713 2.139720e-25 1.210852e-22 -0.8263971 0.07935422
## 19 pud-2.2 178710 2.139720e-25 1.210852e-22 -0.8263971 0.07935422
## 20 cpr-5 178612 2.938093e-25 1.605313e-22 -0.8746309 0.08423025
## mean_obs var_obs tech_var sigma_sq smooth_sigma_sq final_sigma_sq
## 1 6.017851 0.6664751 0.0030782318 0.0071785109 0.009001033 0.009001033
## 2 5.336970 0.6071586 0.0056274997 0.0023363218 0.009974555 0.009974555
## 3 5.606577 0.5104205 0.0066373958 0.0009517097 0.009460763 0.009460763
## 4 6.422207 0.2858795 0.0016874835 0.0095984858 0.009127380 0.009598486
## 5 4.532240 0.8831048 0.0075861955 0.0282199959 0.014554710 0.028219996
## 6 4.532240 0.8831048 0.0075861955 0.0282199959 0.014554710 0.028219996
## 7 4.525542 0.6886811 0.0137410161 0.0089212257 0.014642339 0.014642339
## 8 5.248247 0.3947901 0.0061483985 0.0056684408 0.010302527 0.010302527
## 9 6.154206 0.2883113 0.0021870626 0.0100278374 0.008995154 0.010027837
## 10 5.835642 0.2757498 0.0028869321 0.0052272197 0.009143370 0.009143370
## 11 5.226113 0.3589810 0.0062272018 0.0070588477 0.010404143 0.010404143
## 12 7.013395 0.2272664 0.0009177897 0.0073134062 0.010115646 0.010115646
## 13 6.620497 0.2086972 0.0014829410 0.0001334146 0.009333883 0.009333883
## 14 6.022399 0.2177003 0.0025040739 0.0087528577 0.008999647 0.008999647
## 15 4.018878 1.0323351 0.0278325853 0.0215939443 0.026998299 0.026998299
## 16 7.583655 0.2299978 0.0005303502 0.0066480959 0.012332928 0.012332928
## 17 5.939860 0.5161979 0.0206884267 0.0056289443 0.009039990 0.009039990
## 18 5.524850 0.2059185 0.0023758220 0.0102183621 0.009571054 0.010218362
## 19 5.524850 0.2059185 0.0023758220 0.0102183621 0.009571054 0.010218362
## 20 6.393223 0.2307279 0.0019802600 0.0122092099 0.009107001 0.012209210
write.table(unique(sleuth_significant_txn.down$ens_gene), file="results/sleuth_significant_q005_down.txt", quote=FALSE, row.names = FALSE, col.names = "#ens_gene")
#Summarise some of the numbers
#all transcripts and genes considered in the analysis
length(sleuth_table_txn$target_id)
## [1] 61428
length(sleuth_table_wt$target_id)
## [1] 46904
length(unique(sleuth_table_wt$target_id))
## [1] 46904
#transcripts and genes with sleuth data
sum(sleuth_table_wt$num_aggregated_transcripts, na.rm=TRUE )
## [1] 15845
length(sleuth_table_wt[!is.na(sleuth_table_wt$num_aggregated_transcripts),] $target_id)
## [1] 11337
#significantly DE genes (aggregated) at q<0.05
length(sleuth_significant_wt$target_id)
## [1] 2339
#significantly DE transcripts
length(sleuth_significant_txn$target_id)
## [1] 2263
#Some additional exploratory analysis using sleuth's own functions
plot_pca(so,
color_by = 'condition',
text_labels = TRUE)
#note that using tpm as unit does not result in a great separation along the PC1
#the two types of samples are separated but only just (the SGs cluster together though)
plot_pca(so,
color_by = 'condition',
text_labels = TRUE, units="tpm")
plot_pca(so,
pc_x=2,
pc_y=3,
color_by = 'condition',
text_labels = TRUE)
plot_pca(so,
pc_x=2,
pc_y=8,
color_by = 'condition',
text_labels = TRUE)
#how much of the variance is accounted for by each PC?
plot_pc_variance(obj = so, units = "est_counts")
plot_pc_variance(obj = so, units = "tpm")
#note PC8 also separates the samples
plot_loadings(so, pc_input=1)
plot_loadings(so, pc_input=2)
plot_loadings(so, pc_input=3)
plot_loadings(so, pc_input=8)
#note that all influential genes in PCA seem to be very highly expressed (over 1000 tpm)
plot_group_density(so,
use_filtered = FALSE,
units = "est_counts",
trans = "log",
grouping = "condition")
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `summarise_()` is deprecated as of dplyr 0.7.0.
## Please use `summarise()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
plot_group_density(so,
use_filtered = TRUE,
units = "est_counts",
trans = "log",
grouping = "condition")
plot_sample_heatmap(so)
#check whether there are sex differences
#list of sex determining genes is from:
#https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2619291/pdf/324.pdf
list.sex_genes <- c("xol-1", "sdc-1", "sdc-2",
"her-1", "tra-2", "tra-3",
"fem-1", "fem-2", "fem-3", "tra-1")
plot_transcript_heatmap(so,
transcripts=dplyr::filter(sleuth_table_txn,
ext_gene %in% list.sex_genes)$target_id)
#repeat the plot with est_counts instead of tpm
plot_transcript_heatmap(so,
transcripts=dplyr::filter(sleuth_table_txn,
ext_gene %in% list.sex_genes)$target_id,
units="est_counts")
#ZK287.8b.1 seems to separate two of the normal samples but it's a transcript that
#is not used in differential expression due to filtering - below are original counts
#note that basic filter in sleuth filters out any transcripts with fewer than 5 estimated counts
#in 47% of the samples (effectively here half of the samples)
so$obs_raw[so$obs_norm$target_id == "ZK287.8b.1",]
## target_id est_counts eff_len len tpm sample
## 480169 ZK287.8b.1 0.000000 54.84584 196 0.00000 N1_S5
## 480170 ZK287.8b.1 3.000000 51.71815 196 11.69946 N2_S6
## 480171 ZK287.8b.1 3.825966 52.92062 196 16.27609 N3_S7
## 480172 ZK287.8b.1 0.000000 51.52067 196 0.00000 N4_S8
## 480173 ZK287.8b.1 0.000000 55.92125 196 0.00000 SG1_S1
## 480174 ZK287.8b.1 0.000000 53.04100 196 0.00000 SG2_S2
## 480175 ZK287.8b.1 0.000000 54.34783 196 0.00000 SG3_S3
## 480176 ZK287.8b.1 0.000000 53.99656 196 0.00000 SG4_S4
#volcano plot for genes
#Note: there is a problem with volcano and MA plots not setting hte p-value aggregate to FALSE when reading
#results from the WT test and thus not realising there is a b column
#Error reported is:
#Error in FUN(X[[i]], ...) : object 'b' not found
#To fix this, manually set the pval_aggregate in so before supplying to plots as suggested here:
#https://github.com/pachterlab/sleuth/issues/233
# so$pval_aggregate <- FALSE
# plot_volcano(obj=so,
# test = "conditionneurexin",
# test_type = "wt",
# which_model="full",
# sig_level = 0.05,
# sig_color = "red")
#can also do MA plot
# plot_ma(obj=so,
# test = "conditionneurexin",
# test_type = "wt",
# which_model="full",
# sig_level = 0.05,
# sig_color = "red")
# ALTERNATIVELY
#use the EnhancedVolcano package and function to make nicer plots and to also highlight
#genes easily (the highlight function in plot_volcano from sleuth is not working for me)
#not used as using cutoffs within the volcano plot options
#select.lab <- head(sleuth_significant_txn, 20)$target_id
#use the same function that plot_volcano() uses to produce the data frame needed to make the plot
toptable <- sleuth_results(so,
test="conditionneurexin",
test_type="wt",
which_model="full",
pval_aggregate=FALSE,
rename_cols=FALSE,
show_all=FALSE)
toptable <- dplyr::mutate(toptable, significant = qval < 0.05)
dim(dplyr::filter(toptable, abs(b)>1))
## [1] 2045 16
#there are 2045 transcripts
EnhancedVolcano(
toptable=toptable,
lab=toptable$target_id,
x='b',
y='qval',
xlab="beta_value",
selectLab = NULL,
title="Sleuth results (without p-value aggregation)",
subtitle="",
legend=c("NS","beta","P","P & beta"),
legendLabels=c("NS","beta","P","P & beta"),
transcriptLabSize=5.0,
pLabellingCutoff=10e-6,
FCcutoff=1.0
)
## Warning in EnhancedVolcano(toptable = toptable, lab = toptable$target_id, :
## transcriptLabSize argument deprecated in v1.4 - please use labSize
#create a table with information about top 10 most significant genes
select.res <- dplyr::filter(toptable, abs(b)>1)[1:10,c(1,2,5,6,7)]
dplyr::filter(t2g, target_id %in% select.res$target_id)
## ens_gene target_id
## 1 WBGene00000657 F38A3.1.1
## 2 WBGene00003977 F56G4.2.1
## 3 WBGene00007196 B0513.4a.1
## 4 WBGene00008862 F15D4.5.1
## 5 WBGene00010158 F56G4.3.1
## 6 WBGene00010808 M01E5.6.1
## 7 WBGene00016453 C35E7.1a.1
## 8 WBGene00016627 C44B7.5.1
## 9 WBGene00020218 T04G9.7.1
## 10 WBGene00022644 ZK6.10a.1
## description
## 1 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 2 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 3
## 4
## 5 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 6
## 7 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 8
## 9
## 10 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## ext_gene entrezgene_id
## 1 col-81 174686
## 2 pes-2.1 173025
## 3 B0513.4 178351
## 4 F15D4.5 174969
## 5 pes-2.2 173026
## 6 sepa-1 173196
## 7 vet-2 172993
## 8 C44B7.5 174124
## 9 T04G9.7 180424
## 10 dod-19 178564
merge(x=select.res, y=dplyr::filter(t2g, target_id %in% select.res$target_id), by="target_id", sort=FALSE, no.dups=TRUE)[,c("target_id","ext_gene","entrezgene_id.x","qval","description")]
## target_id ext_gene entrezgene_id.x qval
## 1 C44B7.5.1 C44B7.5 174124 1.118332e-80
## 2 F15D4.5.1 F15D4.5 174969 1.278185e-77
## 3 T04G9.7.1 T04G9.7 180424 8.328212e-57
## 4 C35E7.1a.1 vet-2 172993 5.577445e-46
## 5 F38A3.1.1 col-81 174686 1.185054e-42
## 6 ZK6.10a.1 dod-19 178564 1.619225e-37
## 7 F56G4.2.1 pes-2.1 173025 7.043462e-35
## 8 F56G4.3.1 pes-2.2 173026 7.043462e-35
## 9 M01E5.6.1 sepa-1 173196 1.410412e-34
## 10 B0513.4a.1 B0513.4 178351 2.564074e-34
## description
## 1
## 2
## 3
## 4 Very Early Transcript [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5 COLlagen [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6 Downstream Of DAF-16 (Regulated by DAF-16) [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 8 Patterned Expression Site; Pes-2 protein [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9
## 10
#Examine some bootstrapped counts for individual transcripts of the same gene
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[1], units = "est_counts", color_by = "condition")
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[2], units = "est_counts", color_by = "condition")
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[3], units = "est_counts", color_by = "condition")
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[4], units = "est_counts", color_by = "condition")
#use the shiny app when not knitting the document
#sleuth_live(so)
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] PCAtools_1.2.0 cowplot_1.0.0
## [3] lattice_0.20-41 reshape2_1.4.4
## [5] apeglm_1.8.0 pheatmap_1.0.12
## [7] DESeq2_1.26.0 SummarizedExperiment_1.16.1
## [9] DelayedArray_0.12.3 BiocParallel_1.20.1
## [11] matrixStats_0.56.0 EnhancedVolcano_1.4.0
## [13] ggrepel_0.8.2 Rgraphviz_2.30.0
## [15] topGO_2.38.1 SparseM_1.78
## [17] graph_1.64.0 org.Ce.eg.db_3.10.1
## [19] GO.db_3.10.0 goseq_1.38.0
## [21] geneLenDataBase_1.22.0 BiasedUrn_1.07
## [23] reactome.db_1.70.0 AnnotationDbi_1.48.0
## [25] Biobase_2.46.0 biomaRt_2.42.1
## [27] ggplot2_3.3.2 dplyr_1.0.0
## [29] tximport_1.14.2 sleuth_0.30.0
## [31] tidyr_1.1.0 rtracklayer_1.46.0
## [33] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
## [35] IRanges_2.20.2 S4Vectors_0.24.4
## [37] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.8 Hmisc_4.4-0 BiocFileCache_1.10.2
## [4] plyr_1.8.6 lazyeval_0.2.2 splines_3.6.3
## [7] digest_0.6.25 htmltools_0.5.0 magrittr_1.5
## [10] checkmate_2.0.0 memoise_1.1.0 cluster_2.1.0
## [13] Biostrings_2.54.0 annotate_1.64.0 askpass_1.1
## [16] bdsmatrix_1.3-4 prettyunits_1.1.1 jpeg_0.1-8.1
## [19] colorspace_1.4-1 blob_1.2.1 rappdirs_0.3.1
## [22] xfun_0.14 crayon_1.3.4 RCurl_1.98-1.2
## [25] genefilter_1.68.0 survival_3.2-3 glue_1.4.1
## [28] gtable_0.3.0 zlibbioc_1.32.0 XVector_0.26.0
## [31] BiocSingular_1.2.2 Rhdf5lib_1.8.0 scales_1.1.1
## [34] mvtnorm_1.1-1 DBI_1.1.0 Rcpp_1.0.4
## [37] xtable_1.8-4 progress_1.2.2 emdbook_1.3.12
## [40] htmlTable_2.0.0 dqrng_0.2.1 rsvd_1.0.3
## [43] foreign_0.8-75 bit_1.1-15.2 Formula_1.2-3
## [46] htmlwidgets_1.5.1 httr_1.4.1 RColorBrewer_1.1-2
## [49] acepack_1.4.1 ellipsis_0.3.1 pkgconfig_2.0.3
## [52] XML_3.99-0.3 farver_2.0.3 nnet_7.3-14
## [55] dbplyr_1.4.4 locfit_1.5-9.4 tidyselect_1.1.0
## [58] labeling_0.3 rlang_0.4.6 munsell_0.5.0
## [61] tools_3.6.3 generics_0.0.2 RSQLite_2.2.0
## [64] evaluate_0.14 stringr_1.4.0 yaml_2.2.1
## [67] knitr_1.28 bit64_0.9-7 purrr_0.3.4
## [70] nlme_3.1-148 compiler_3.6.3 rstudioapi_0.11
## [73] curl_4.3 png_0.1-7 tibble_3.0.1
## [76] geneplotter_1.64.0 stringi_1.4.6 GenomicFeatures_1.38.2
## [79] Matrix_1.2-18 vctrs_0.3.1 pillar_1.4.4
## [82] lifecycle_0.2.0 irlba_2.3.3 data.table_1.12.8
## [85] bitops_1.0-6 R6_2.4.1 latticeExtra_0.6-29
## [88] gridExtra_2.3 MASS_7.3-51.6 assertthat_0.2.1
## [91] rhdf5_2.30.1 openssl_1.4.1 withr_2.2.0
## [94] GenomicAlignments_1.22.1 Rsamtools_2.2.3 GenomeInfoDbData_1.2.2
## [97] mgcv_1.8-31 hms_0.5.3 rpart_4.1-15
## [100] coda_0.19-3 rmarkdown_2.3 DelayedMatrixStats_1.8.0
## [103] aggregation_1.0.1 bbmle_1.0.23.1 numDeriv_2016.8-1.1
## [106] base64enc_0.1-3
#save.image(file="neurexin_analysis.Rdata")